Introduce wait_for_subscription_sync for TAP tests

  • Jump to comment-1
    sawada.mshk@gmail.com2022-07-26T01:36:39+00:00
    Hi, In tap tests for logical replication, we have the following code in many places: $node_publisher->wait_for_catchup('tap_sub'); my $synced_query = "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');"; $node_subscriber->poll_query_until('postgres', $synced_query) or die "Timed out while waiting for subscriber to synchronize data"; Also, we sometime forgot to check either one, like we fixed in commit 1f50918a6fb02207d151e7cb4aae4c36de9d827c. I think we can have a new function to wait for all subscriptions to synchronize data. The attached patch introduce a new function wait_for_subscription_sync(). With this function, we can replace the above code with this one function as follows: $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub'); Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
    • Jump to comment-1
      amit.kapila16@gmail.com2022-07-26T05:01:15+00:00
      On Tue, Jul 26, 2022 at 7:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Hi, > > In tap tests for logical replication, we have the following code in many places: > > $node_publisher->wait_for_catchup('tap_sub'); > my $synced_query = > "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT > IN ('r', 's');"; > $node_subscriber->poll_query_until('postgres', $synced_query) > or die "Timed out while waiting for subscriber to synchronize data"; > > Also, we sometime forgot to check either one, like we fixed in commit > 1f50918a6fb02207d151e7cb4aae4c36de9d827c. > > I think we can have a new function to wait for all subscriptions to > synchronize data. The attached patch introduce a new function > wait_for_subscription_sync(). With this function, we can replace the > above code with this one function as follows: > > $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub'); > +1. This reduces quite some code in various tests and will make it easier to write future tests. Few comments/questions: ==================== 1. -$node_publisher->wait_for_catchup('mysub1'); - -# Also wait for initial table sync to finish. -my $synced_query = - "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');"; -$node_subscriber->poll_query_until('postgres', $synced_query) - or die "Timed out while waiting for subscriber to synchronize data"; - # Also wait for initial table sync to finish. -$node_subscriber->poll_query_until('postgres', $synced_query) - or die "Timed out while waiting for subscriber to synchronize data"; +$node_subscriber->wait_for_subscription_sync($node_publisher, 'mysub1'); It seems to me without your patch there is an extra poll in the above test. If so, we can probably remove that in a separate patch? 2. + # wait for the replication to catchup if required. + if (defined($publisher)) + { + croak 'subscription name must be specified' unless defined($subname); + $publisher->wait_for_catchup($subname, 'replay'); + } + + # then, wait for all table states to be ready. + print "Waiting for all subscriptions in \"$name\" to synchronize data\n"; + my $query = qq[SELECT count(1) = 0 + FROM pg_subscription_rel + WHERE srsubstate NOT IN ('r', 's');]; + $self->poll_query_until($dbname, $query) + or croak "timed out waiting for subscriber to synchronize data"; In the tests, I noticed that a few places did wait_for_catchup after the subscription check, and at other places, we did that check before as you have it here. Ideally, I think wait_for_catchup should be after confirming the initial sync is over as without initial sync, the publisher node won't be completely in sync with the subscriber. What do you think? 3. In the code quoted in the previous point, why did you pass the second parameter as 'replay' when we have not used that in the tests otherwise? -- With Regards, Amit Kapila.
      • Jump to comment-1
        sawada.mshk@gmail.com2022-07-26T07:41:57+00:00
        On Tue, Jul 26, 2022 at 2:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jul 26, 2022 at 7:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Hi, > > > > In tap tests for logical replication, we have the following code in many places: > > > > $node_publisher->wait_for_catchup('tap_sub'); > > my $synced_query = > > "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT > > IN ('r', 's');"; > > $node_subscriber->poll_query_until('postgres', $synced_query) > > or die "Timed out while waiting for subscriber to synchronize data"; > > > > Also, we sometime forgot to check either one, like we fixed in commit > > 1f50918a6fb02207d151e7cb4aae4c36de9d827c. > > > > I think we can have a new function to wait for all subscriptions to > > synchronize data. The attached patch introduce a new function > > wait_for_subscription_sync(). With this function, we can replace the > > above code with this one function as follows: > > > > $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub'); > > > > +1. This reduces quite some code in various tests and will make it > easier to write future tests. > > Few comments/questions: > ==================== > 1. > -$node_publisher->wait_for_catchup('mysub1'); > - > -# Also wait for initial table sync to finish. > -my $synced_query = > - "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT > IN ('r', 's');"; > -$node_subscriber->poll_query_until('postgres', $synced_query) > - or die "Timed out while waiting for subscriber to synchronize data"; > - > # Also wait for initial table sync to finish. > -$node_subscriber->poll_query_until('postgres', $synced_query) > - or die "Timed out while waiting for subscriber to synchronize data"; > +$node_subscriber->wait_for_subscription_sync($node_publisher, 'mysub1'); > > It seems to me without your patch there is an extra poll in the above > test. If so, we can probably remove that in a separate patch? Agreed. > > 2. > + # wait for the replication to catchup if required. > + if (defined($publisher)) > + { > + croak 'subscription name must be specified' unless defined($subname); > + $publisher->wait_for_catchup($subname, 'replay'); > + } > + > + # then, wait for all table states to be ready. > + print "Waiting for all subscriptions in \"$name\" to synchronize data\n"; > + my $query = qq[SELECT count(1) = 0 > + FROM pg_subscription_rel > + WHERE srsubstate NOT IN ('r', 's');]; > + $self->poll_query_until($dbname, $query) > + or croak "timed out waiting for subscriber to synchronize data"; > > In the tests, I noticed that a few places did wait_for_catchup after > the subscription check, and at other places, we did that check before > as you have it here. Ideally, I think wait_for_catchup should be after > confirming the initial sync is over as without initial sync, the > publisher node won't be completely in sync with the subscriber. What do you mean by the last sentence? I thought the order doesn't matter here. Even if we do wait_for_catchup first then the subscription check, we can make sure that the apply worker caught up and table synchronization has been done, no? > > 3. In the code quoted in the previous point, why did you pass the > second parameter as 'replay' when we have not used that in the tests > otherwise? It makes sure to use the (current) default value of $mode of wait_for_catchup(). But probably it's not necessary, I've removed it. I've attached an updated patch as well as a patch to remove duplicated waits in 007_ddl.pl. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
        • Jump to comment-1
          amit.kapila16@gmail.com2022-07-27T11:54:46+00:00
          On Tue, Jul 26, 2022 at 1:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Jul 26, 2022 at 2:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 2. > > + # wait for the replication to catchup if required. > > + if (defined($publisher)) > > + { > > + croak 'subscription name must be specified' unless defined($subname); > > + $publisher->wait_for_catchup($subname, 'replay'); > > + } > > + > > + # then, wait for all table states to be ready. > > + print "Waiting for all subscriptions in \"$name\" to synchronize data\n"; > > + my $query = qq[SELECT count(1) = 0 > > + FROM pg_subscription_rel > > + WHERE srsubstate NOT IN ('r', 's');]; > > + $self->poll_query_until($dbname, $query) > > + or croak "timed out waiting for subscriber to synchronize data"; > > > > In the tests, I noticed that a few places did wait_for_catchup after > > the subscription check, and at other places, we did that check before > > as you have it here. Ideally, I think wait_for_catchup should be after > > confirming the initial sync is over as without initial sync, the > > publisher node won't be completely in sync with the subscriber. > > What do you mean by the last sentence? I thought the order doesn't > matter here. Even if we do wait_for_catchup first then the > subscription check, we can make sure that the apply worker caught up > and table synchronization has been done, no? > That's right. I thought we should first ensure the subscriber has finished operations if possible, like in this case, it can ensure table sync has finished and then we can ensure whether publisher and subscriber are in sync. That sounds more logical to me. -- With Regards, Amit Kapila.
          • Jump to comment-1
            sawada.mshk@gmail.com2022-07-28T01:07:36+00:00
            On Wed, Jul 27, 2022 at 8:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jul 26, 2022 at 1:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Jul 26, 2022 at 2:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > 2. > > > + # wait for the replication to catchup if required. > > > + if (defined($publisher)) > > > + { > > > + croak 'subscription name must be specified' unless defined($subname); > > > + $publisher->wait_for_catchup($subname, 'replay'); > > > + } > > > + > > > + # then, wait for all table states to be ready. > > > + print "Waiting for all subscriptions in \"$name\" to synchronize data\n"; > > > + my $query = qq[SELECT count(1) = 0 > > > + FROM pg_subscription_rel > > > + WHERE srsubstate NOT IN ('r', 's');]; > > > + $self->poll_query_until($dbname, $query) > > > + or croak "timed out waiting for subscriber to synchronize data"; > > > > > > In the tests, I noticed that a few places did wait_for_catchup after > > > the subscription check, and at other places, we did that check before > > > as you have it here. Ideally, I think wait_for_catchup should be after > > > confirming the initial sync is over as without initial sync, the > > > publisher node won't be completely in sync with the subscriber. > > > > What do you mean by the last sentence? I thought the order doesn't > > matter here. Even if we do wait_for_catchup first then the > > subscription check, we can make sure that the apply worker caught up > > and table synchronization has been done, no? > > > > That's right. I thought we should first ensure the subscriber has > finished operations if possible, like in this case, it can ensure > table sync has finished and then we can ensure whether publisher and > subscriber are in sync. That sounds more logical to me. Make sense. I've incorporated it in the v3 patch. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
        • Jump to comment-1
          shiy.fnst@fujitsu.com2022-07-27T10:08:47+00:00
          On Tue, Jul 26, 2022 3:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I've attached an updated patch as well as a patch to remove duplicated > waits in 007_ddl.pl. > Thanks for your patch. Here are some comments. 1. I think some comments need to be changed in the patch. For example: # Also wait for initial table sync to finish # Wait for initial sync to finish as well Words like "Also" and "as well" can be removed now, we originally used them because we wait for catchup and "also" wait for initial sync. 2. In the following places, we can remove wait_for_catchup() and then call it in wait_for_subscription_sync(). 2.1. 030_origin.pl: @@ -128,8 +120,7 @@ $node_B->safe_psql( $node_C->wait_for_catchup($appname_B2); -$node_B->poll_query_until('postgres', $synced_query) - or die "Timed out while waiting for subscriber to synchronize data"; +$node_B->wait_for_subscription_sync; 2.2. 031_column_list.pl: @@ -385,7 +373,7 @@ $node_subscriber->safe_psql( ALTER SUBSCRIPTION sub1 SET PUBLICATION pub2, pub3 )); -wait_for_subscription_sync($node_subscriber); +$node_subscriber->wait_for_subscription_sync; $node_publisher->wait_for_catchup('sub1'); 2.3. 100_bugs.pl: @@ -281,8 +276,7 @@ $node_subscriber->safe_psql('postgres', $node_publisher->wait_for_catchup('tap_sub'); # Also wait for initial table sync to finish -$node_subscriber->poll_query_until('postgres', $synced_query) - or die "Timed out while waiting for subscriber to synchronize data"; +$node_subscriber->wait_for_subscription_sync; is( $node_subscriber->safe_psql( 'postgres', "SELECT * FROM tab_replidentity_index"), Regards, Shi yu
          • Jump to comment-1
            sawada.mshk@gmail.com2022-07-28T01:06:41+00:00
            On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote: > > On Tue, Jul 26, 2022 3:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I've attached an updated patch as well as a patch to remove duplicated > > waits in 007_ddl.pl. > > > > Thanks for your patch. Here are some comments. Thank you for the comments! > > 1. > I think some comments need to be changed in the patch. > For example: > # Also wait for initial table sync to finish > # Wait for initial sync to finish as well > > Words like "Also" and "as well" can be removed now, we originally used them > because we wait for catchup and "also" wait for initial sync. Agreed. > > 2. > In the following places, we can remove wait_for_catchup() and then call it in > wait_for_subscription_sync(). > > 2.1. > 030_origin.pl: > @@ -128,8 +120,7 @@ $node_B->safe_psql( > > $node_C->wait_for_catchup($appname_B2); > > -$node_B->poll_query_until('postgres', $synced_query) > - or die "Timed out while waiting for subscriber to synchronize data"; > +$node_B->wait_for_subscription_sync; > > 2.2. > 031_column_list.pl: > @@ -385,7 +373,7 @@ $node_subscriber->safe_psql( > ALTER SUBSCRIPTION sub1 SET PUBLICATION pub2, pub3 > )); > > -wait_for_subscription_sync($node_subscriber); > +$node_subscriber->wait_for_subscription_sync; > > $node_publisher->wait_for_catchup('sub1'); > > 2.3. > 100_bugs.pl: > @@ -281,8 +276,7 @@ $node_subscriber->safe_psql('postgres', > $node_publisher->wait_for_catchup('tap_sub'); > > # Also wait for initial table sync to finish > -$node_subscriber->poll_query_until('postgres', $synced_query) > - or die "Timed out while waiting for subscriber to synchronize data"; > +$node_subscriber->wait_for_subscription_sync; > > is( $node_subscriber->safe_psql( > 'postgres', "SELECT * FROM tab_replidentity_index"), Agreed. I've attached updated patches that incorporated the above comments as well as the comment from Amit. BTW regarding 0001 patch to remove the duplicated wait, should we backpatch to v15? I think we can do that as it's an obvious fix and it seems to be an oversight in 8f2e2bbf145. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
            • Jump to comment-1
              amit.kapila16@gmail.com2022-07-30T06:55:01+00:00
              On Thu, Jul 28, 2022 at 6:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com > <shiy.fnst@fujitsu.com> wrote: > > I've attached updated patches that incorporated the above comments as > well as the comment from Amit. > > BTW regarding 0001 patch to remove the duplicated wait, should we > backpatch to v15? > I think it is good to clean this test case even for PG15 even though there is no major harm in keeping it. I'll push this early next week by Tuesday unless someone thinks otherwise. -- With Regards, Amit Kapila.
              • Jump to comment-1
                amit.kapila16@gmail.com2022-08-03T04:51:15+00:00
                On Sat, Jul 30, 2022 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jul 28, 2022 at 6:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com > > <shiy.fnst@fujitsu.com> wrote: > > > > I've attached updated patches that incorporated the above comments as > > well as the comment from Amit. > > > > BTW regarding 0001 patch to remove the duplicated wait, should we > > backpatch to v15? > > > > I think it is good to clean this test case even for PG15 even though > there is no major harm in keeping it. I'll push this early next week > by Tuesday unless someone thinks otherwise. > Pushed this one and now I'll look at your other patch. -- With Regards, Amit Kapila.
                • Jump to comment-1
                  amit.kapila16@gmail.com2022-08-04T01:37:16+00:00
                  On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Pushed this one and now I'll look at your other patch. > I have pushed the second patch as well after making minor changes in the comments. Alvaro [1] and Tom [2] suggest to back-patch this and they sound reasonable to me. Will you be able to produce back branch patches? [1] - https://www.postgresql.org/message-id/20220803104544.k2luy5hr2ugnhgr2%40alvherre.pgsql [2] - https://www.postgresql.org/message-id/2966703.1659535343%40sss.pgh.pa.us -- With Regards, Amit Kapila.
                  • Jump to comment-1
                    sawada.mshk@gmail.com2022-08-04T06:28:05+00:00
                    On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Pushed this one and now I'll look at your other patch. > > > > I have pushed the second patch as well after making minor changes in > the comments. Alvaro [1] and Tom [2] suggest to back-patch this and > they sound reasonable to me. Will you be able to produce back branch > patches? Yes. I've attached patches for backbranches. The updates are straightforward on v11 - v15. However, on v10, we don't use wait_for_catchup() in some logical replication test cases. The commit bbd3363e128dae refactored the tests to use wait_for_catchup but it's not backpatched. So in the patch for v10, I didn't change the code that was changed by the commit. Also, since wait_for_catchup requires to specify $target_lsn, unlike the one in v11 or later, I changed wait_for_subscription_sync() accordingly. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
                    • Jump to comment-1
                      tgl@sss.pgh.pa.us2022-08-04T13:43:26+00:00
                      Masahiko Sawada <sawada.mshk@gmail.com> writes: > Yes. I've attached patches for backbranches. FWIW, I'd recommend waiting till after next week's wrap before pushing these. While I'm definitely in favor of doing this, the odds of introducing a bug are nonzero, so right before a release deadline doesn't seem like a good time. regards, tom lane
                      • Jump to comment-1
                        amit.kapila16@gmail.com2022-08-05T02:57:23+00:00
                        On Thu, Aug 4, 2022 at 7:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > Yes. I've attached patches for backbranches. > > FWIW, I'd recommend waiting till after next week's wrap before > pushing these. While I'm definitely in favor of doing this, > the odds of introducing a bug are nonzero, so right before a > release deadline doesn't seem like a good time. > Agreed. I was planning to do it only after next week's wrap. Thanks for your suggestion. -- With Regards, Amit Kapila.
                    • Jump to comment-1
                      shiy.fnst@fujitsu.com2022-08-04T09:49:15+00:00
                      On Thu, Aug 4, 2022 2:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > Pushed this one and now I'll look at your other patch. > > > > > > > I have pushed the second patch as well after making minor changes in > > the comments. Alvaro [1] and Tom [2] suggest to back-patch this and > > they sound reasonable to me. Will you be able to produce back branch > > patches? > > Yes. I've attached patches for backbranches. The updates are > straightforward on v11 - v15. However, on v10, we don't use > wait_for_catchup() in some logical replication test cases. The commit > bbd3363e128dae refactored the tests to use wait_for_catchup but it's > not backpatched. So in the patch for v10, I didn't change the code > that was changed by the commit. Also, since wait_for_catchup requires > to specify $target_lsn, unlike the one in v11 or later, I changed > wait_for_subscription_sync() accordingly. > Thanks for your patches. In the patches for pg11 ~ pg14, it looks we need to add a "=pod" before the current change in PostgresNode.pm. Right? Regards, Shi yu
                      • Jump to comment-1
                        shiy.fnst@fujitsu.com2022-08-05T01:39:49+00:00
                        On Thu, Aug 4, 2022 5:49 PM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote: > > On Thu, Aug 4, 2022 2:28 PM Masahiko Sawada <sawada.mshk@gmail.com> > wrote: > > > > On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > > > On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila > <amit.kapila16@gmail.com> > > wrote: > > > > > > > > Pushed this one and now I'll look at your other patch. > > > > > > > > > > I have pushed the second patch as well after making minor changes in > > > the comments. Alvaro [1] and Tom [2] suggest to back-patch this and > > > they sound reasonable to me. Will you be able to produce back branch > > > patches? > > > > Yes. I've attached patches for backbranches. The updates are > > straightforward on v11 - v15. However, on v10, we don't use > > wait_for_catchup() in some logical replication test cases. The commit > > bbd3363e128dae refactored the tests to use wait_for_catchup but it's > > not backpatched. So in the patch for v10, I didn't change the code > > that was changed by the commit. Also, since wait_for_catchup requires > > to specify $target_lsn, unlike the one in v11 or later, I changed > > wait_for_subscription_sync() accordingly. > > > > Thanks for your patches. > > In the patches for pg11 ~ pg14, it looks we need to add a "=pod" before the > current change in PostgresNode.pm. Right? > By the way, I notice that in 002_types.pl (on master branch), it seems the "as well" in the following comment should be removed. Is it worth being fixed? $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub WITH (slot_name = tap_sub_slot)" ); # Wait for initial sync to finish as well $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub'); Regards, Shi yu
                • Jump to comment-1
                  sawada.mshk@gmail.com2022-08-03T06:46:02+00:00
                  On Wed, Aug 3, 2022 at 1:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jul 30, 2022 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Jul 28, 2022 at 6:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com > > > <shiy.fnst@fujitsu.com> wrote: > > > > > > I've attached updated patches that incorporated the above comments as > > > well as the comment from Amit. > > > > > > BTW regarding 0001 patch to remove the duplicated wait, should we > > > backpatch to v15? > > > > > > > I think it is good to clean this test case even for PG15 even though > > there is no major harm in keeping it. I'll push this early next week > > by Tuesday unless someone thinks otherwise. > > > > Pushed this one and now I'll look at your other patch. Thanks! Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/